Due: October 22nd (Wednesday, End of Day)
This R Markdown document performs an exploratory data analysis (EDA) of
the dataset Hospital_Inpatient_Discharges.csv. It contains
code, explanations, a dataset summary, descriptive statistics, graphical
displays, variance/SD measures, normality checks, and initial
statistical tests.
## Dataset dimensions: Rows: 21075 Columns: 34
## Rows: 21,075
## Columns: 34
## $ Health.Service.Area <chr> "Western NY", "Western NY", "Weste…
## $ Hospital.County <chr> "Niagara", "Niagara", "Niagara", "…
## $ Operating.Certificate.Number <int> 3101000, 3101000, 3101000, 3101000…
## $ Facility.ID <int> 565, 565, 565, 565, 565, 565, 565,…
## $ Facility.Name <chr> "Eastern Niagara Hospital - Lockpo…
## $ Age.Group <chr> "18 to 29", "18 to 29", "0 to 17",…
## $ Zip.Code...3.digits <chr> "140", "140", "140", "140", "140",…
## $ Gender <chr> "M", "F", "F", "M", "F", "F", "M",…
## $ Race <chr> "White", "White", "White", "White"…
## $ Ethnicity <chr> "Not Span/Hispanic", "Not Span/His…
## $ Length.of.Stay <chr> "1", "2", "2", "5", "2", "2", "2",…
## $ Type.of.Admission <chr> "Emergency", "Emergency", "Newborn…
## $ Patient.Disposition <chr> "Home or Self Care", "Home or Self…
## $ Discharge.Year <int> 2012, 2012, 2012, 2012, 2012, 2012…
## $ CCS.Diagnosis.Code <int> 7, 190, 218, 120, 195, 188, 130, 2…
## $ CCS.Diagnosis.Description <chr> "Viral infection", "Fetal distress…
## $ CCS.Procedure.Code <int> 4, 137, 228, 76, 137, 134, 39, 115…
## $ CCS.Procedure.Description <chr> "DIAGNOSTIC SPINAL TAP", "OT PRCS …
## $ APR.DRG.Code <int> 723, 560, 640, 254, 560, 540, 194,…
## $ APR.DRG.Description <chr> "Viral illness", "Vaginal delivery…
## $ APR.MDC.Code <int> 18, 14, 15, 6, 14, 14, 5, 15, 6, 1…
## $ APR.MDC.Description <chr> "Infectious and Parasitic Diseases…
## $ APR.Severity.of.Illness.Code <int> 1, 1, 1, 1, 2, 1, 2, 1, 1, 2, 1, 2…
## $ APR.Severity.of.Illness.Description <chr> "Minor", "Minor", "Minor", "Minor"…
## $ APR.Risk.of.Mortality <chr> "Minor", "Minor", "Minor", "Modera…
## $ APR.Medical.Surgical.Description <chr> "Medical", "Medical", "Medical", "…
## $ Payment.Typology.1 <chr> "Self-Pay", "Medicaid", "Medicaid"…
## $ Payment.Typology.2 <chr> "", "", "", "Medicare", "", "", ""…
## $ Payment.Typology.3 <chr> "", "", "", "", "", "", "", "", ""…
## $ Birth.Weight <int> 0, 0, 3100, 0, 0, 0, 0, 2900, 0, 0…
## $ Abortion.Edit.Indicator <chr> "N", "N", "N", "N", "N", "N", "N",…
## $ Emergency.Department.Indicator <chr> "Y", "N", "N", "Y", "N", "N", "N",…
## $ Total.Charges <dbl> 4334.14, 2076.62, 1638.00, 9927.27…
## $ Total.Costs <dbl> 1560.46, 2221.43, 1982.04, 4933.30…
Here I check each column for missing values and report the count per column.
## # A tibble: 34 × 2
## `column(variable)` n_missing
## <chr> <int>
## 1 Health.Service.Area 0
## 2 Hospital.County 0
## 3 Operating.Certificate.Number 0
## 4 Facility.ID 0
## 5 Facility.Name 0
## 6 Age.Group 0
## 7 Zip.Code...3.digits 0
## 8 Gender 0
## 9 Race 0
## 10 Ethnicity 0
## # ℹ 24 more rows
## No missing values detected. Dataset is clean.
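A count of `NA` values alone can miss blank strings. In the structure listing, Payment.Typology.2 and Payment.Typology.3 contain `""` entries, which `is.na()` does not flag. The following is a minimal sketch, on a toy data frame with made-up values, of counting both kinds per column; the object names (`df`, `missing_summary`) are illustrative and not taken from the original analysis.

```r
# Toy data frame standing in for the discharge data (values are illustrative).
df <- data.frame(
  Payment.Typology.1 = c("Self-Pay", "Medicaid", "Medicare"),
  Payment.Typology.2 = c("", "Medicare", ""),
  Total.Costs        = c(1560.46, NA, 1982.04),
  stringsAsFactors   = FALSE
)

# Count true NA values per column (what is.na() detects).
n_na <- vapply(df, function(x) sum(is.na(x)), integer(1))

# Count blank or whitespace-only strings per character column.
n_blank <- vapply(df, function(x) {
  if (is.character(x)) sum(!is.na(x) & trimws(x) == "") else 0L
}, integer(1))

missing_summary <- data.frame(
  column  = names(df),
  n_na    = unname(n_na),
  n_blank = unname(n_blank)
)
print(missing_summary)
```

Whether a blank secondary payment field should count as missing is a judgment call; it may simply mean that no second payer exists.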
Here I check for duplicate rows and remove them:
## Number of duplicate rows: 4
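The duplicate check can be sketched in base R as follows. This is a toy example with made-up rows, not the original chunk. In the report itself, removing the 4 duplicates from 21,075 rows leaves the 21,071 rows that appear in the later summaries.

```r
# Toy data with two fully duplicated rows (values are illustrative).
df <- data.frame(
  id    = c(1, 2, 2, 3, 3, 3),
  value = c("a", "b", "b", "c", "c", "d"),
  stringsAsFactors = FALSE
)

# duplicated() flags rows that are identical to an earlier row.
n_dup <- sum(duplicated(df))
cat("Number of duplicate rows:", n_dup, "\n")

# Keep only the first occurrence of each row.
df_unique <- df[!duplicated(df), , drop = FALSE]
cat("Rows after removal:", nrow(df_unique), "\n")
```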
## Numeric Columns summary:
| Name | num_vars |
|---|---|
| Number of rows | 21071 |
| Number of columns | 11 |
| Column type frequency: | |
| numeric | 11 |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Operating.Certificate.Number | 0 | 1 | 2880548.75 | 579505.37 | 1401014.00 | 3101000.00 | 3102000.00 | 3121001.00 | 3121001.0 | ▁▁▁▁▇ |
| Facility.ID | 0 | 1 | 576.23 | 7.07 | 565.00 | 574.00 | 574.00 | 583.00 | 585.0 | ▅▁▇▃▇ |
| Discharge.Year | 0 | 1 | 2012.00 | 0.00 | 2012.00 | 2012.00 | 2012.00 | 2012.00 | 2012.0 | ▁▁▇▁▁ |
| CCS.Diagnosis.Code | 0 | 1 | 218.95 | 191.22 | 2.00 | 108.00 | 153.00 | 218.00 | 670.0 | ▇▇▁▁▂ |
| CCS.Procedure.Code | 0 | 1 | 96.46 | 88.59 | 0.00 | 0.00 | 86.00 | 202.00 | 231.0 | ▇▂▂▂▆ |
| APR.DRG.Code | 0 | 1 | 406.24 | 245.63 | 4.00 | 198.00 | 321.00 | 640.00 | 952.0 | ▅▇▃▅▂ |
| APR.MDC.Code | 0 | 1 | 10.13 | 6.11 | 1.00 | 5.00 | 8.00 | 15.00 | 25.0 | ▇▇▅▅▁ |
| APR.Severity.of.Illness.Code | 0 | 1 | 1.97 | 0.85 | 1.00 | 1.00 | 2.00 | 2.00 | 4.0 | ▆▇▁▃▁ |
| Birth.Weight | 0 | 1 | 174.99 | 739.64 | 0.00 | 0.00 | 0.00 | 0.00 | 4900.0 | ▇▁▁▁▁ |
| Total.Charges | 0 | 1 | 12421.15 | 13878.88 | 423.98 | 5084.56 | 8267.16 | 14651.49 | 294515.6 | ▇▁▁▁▁ |
| Total.Costs | 0 | 1 | 6607.40 | 7378.03 | 103.31 | 2634.34 | 4378.96 | 7826.79 | 187205.7 | ▇▁▁▁▁ |
## Categorical Columns Summary (first 5 columns):
## Column: Health.Service.Area
## Western NY
## 21071
##
## Column: Hospital.County
## Niagara
## 21071
##
## Column: Facility.Name
##
## Niagara Falls Memorial Medical Center
## 6487
## Mount St Marys Hospital and Health Center
## 5587
## Eastern Niagara Hospital - Lockport Division
## 4553
## Degraff Memorial Hospital
## 2802
## Eastern Niagara Hospital - Newfane Division
## 1642
##
## Column: Age.Group
##
## 70 or Older 50 to 69 30 to 49 18 to 29 0 to 17
## 7523 5974 3852 2369 1353
##
## Column: Zip.Code...3.digits
##
## 143 140 141 142 OOS
## 8077 6436 5261 870 189 160
##
## Objective: to examine whether elective admissions are associated with longer hospital stays than emergency admissions, using SPARCS 2012 Niagara County data.
##
## Why it matters:
## Hospital efficiency and patient-flow optimization rely on understanding LOS variation.
| los | admission_type | age_group | gender |
|---|---|---|---|
| 1 | Emergency | 18 to 29 | M |
| 2 | Emergency | 18 to 29 | F |
| 5 | Emergency | 70 or Older | M |
| 2 | Elective | 18 to 29 | F |
| 2 | Elective | 0 to 17 | F |
| 2 | Elective | 70 or Older | M |
## los admission_type age_group gender
## Min. : 1.00 Elective : 4145 0 to 17 : 196 F:11133
## 1st Qu.: 2.00 Emergency :15083 18 to 29 :2161 M: 8095
## Median : 3.00 Newborn : 0 30 to 49 :3662
## Mean : 5.41 Not Available: 0 50 to 69 :5826
## 3rd Qu.: 6.00 Urgent : 0 70 or Older:7383
## Max. :112.00
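The analysis subset summarized above (numeric `los`, two admission types, factor columns) might be prepared along these lines. This is a sketch under assumptions: the toy rows are invented, `raw` is a hypothetical name, and the `"120 +"` top-coded value is an assumption about how long stays are recorded in the character column. Treating it as 120 is one possible choice, not necessarily the one made in the original analysis.

```r
# Toy rows mimicking the raw columns (values are illustrative).
raw <- data.frame(
  Length.of.Stay    = c("1", "2", "120 +", "5", "2"),
  Type.of.Admission = c("Emergency", "Elective", "Emergency", "Newborn", "Elective"),
  Age.Group         = c("18 to 29", "70 or Older", "50 to 69", "0 to 17", "30 to 49"),
  Gender            = c("M", "F", "F", "M", "F"),
  stringsAsFactors  = FALSE
)

# Keep only the two admission types being compared.
dfq <- raw[raw$Type.of.Admission %in% c("Elective", "Emergency"), ]

# Length.of.Stay is stored as character; strip an assumed trailing " +"
# (top-coding marker) before converting to numeric.
dfq$los <- as.numeric(sub("\\s*\\+$", "", dfq$Length.of.Stay))

# Explicit levels make Elective the reference group and drop unused types.
dfq$admission_type <- factor(dfq$Type.of.Admission,
                             levels = c("Elective", "Emergency"))
dfq$age_group <- factor(dfq$Age.Group)
dfq$gender    <- factor(dfq$Gender)

dfq <- dfq[, c("los", "admission_type", "age_group", "gender")]
summary(dfq)
```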
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 246.15 < 2.2e-16 ***
## 19226
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
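Levene's test with `center = median` (also known as the Brown-Forsythe variant) amounts to a one-way ANOVA on the absolute deviations of each observation from its group median. The sketch below reproduces that idea in base R on simulated data; the simulated values and variable names are assumptions for illustration only.

```r
# Simulated right-skewed stays for two groups (illustrative, not SPARCS data).
set.seed(1)
los   <- c(rexp(50, rate = 1 / 6), rexp(50, rate = 1 / 4))
group <- factor(rep(c("Elective", "Emergency"), each = 50))

# Absolute deviation of each observation from its own group median.
abs_dev <- abs(los - ave(los, group, FUN = median))

# One-way ANOVA on the absolute deviations; its F test is the
# median-centred Levene test.
levene_fit <- anova(lm(abs_dev ~ group))
print(levene_fit)
```

A small p-value, as in the output above, indicates unequal variances between the groups, which is the usual motivation for preferring a Welch t-test over the pooled-variance version.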
## [1] Emergency Elective
## Levels: Elective Emergency
##
## Welch Two Sample t-test
##
## data: los by admission_type
## t = 12.229, df = 5290.4, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group Elective and group Emergency is not equal to 0
## 95 percent confidence interval:
## 1.405815 1.942575
## sample estimates:
## mean in group Elective mean in group Emergency
## 6.723522 5.049327
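To make the Welch output easier to read, the sketch below runs `t.test()` on simulated data and then recomputes the t statistic and the Welch-Satterthwaite degrees of freedom by hand. The data are simulated and only the name `dfq` mirrors the model call shown elsewhere in the report.

```r
# Simulated, unbalanced groups (illustrative values only).
set.seed(42)
dfq <- data.frame(
  los = c(rexp(40, rate = 1 / 6.7), rexp(120, rate = 1 / 5)),
  admission_type = factor(rep(c("Elective", "Emergency"), times = c(40, 120)))
)

# t.test() uses the Welch (unequal-variance) form by default.
t_res <- t.test(los ~ admission_type, data = dfq)

# Manual computation from group means, variances, and sizes.
groups <- split(dfq$los, dfq$admission_type)
m <- vapply(groups, mean, numeric(1))
v <- vapply(groups, var, numeric(1))
n <- vapply(groups, length, integer(1))

se        <- sqrt(sum(v / n))                        # SE of the difference
t_manual  <- (m[["Elective"]] - m[["Emergency"]]) / se
df_manual <- se^4 / sum((v / n)^2 / (n - 1))         # Welch-Satterthwaite df

cat("t:", t_manual, " df:", df_manual, "\n")
```

The non-integer degrees of freedom in the report (5290.4) come from this approximation; the sign of t follows the factor level order, first level minus second.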
## Elective mean LOS: 6.72 days
## Emergency mean LOS: 5.05 days
## Difference in means: 1.67 days
## Emergency stays are 24.9 % shorter than Elective stays (equivalently, Elective stays are 33.2 % longer than Emergency stays).
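The percentage depends on which mean serves as the base. Using the two sample means reported by the t-test, the 24.9 % figure is the difference divided by the elective mean, while dividing by the emergency mean gives a larger value. A short worked check (variable names are illustrative):

```r
# Group means copied from the Welch t-test output.
mean_elective  <- 6.723522
mean_emergency <- 5.049327

diff_days <- mean_elective - mean_emergency

# Base = elective mean: how much shorter emergency stays are.
pct_of_elective <- 100 * diff_days / mean_elective

# Base = emergency mean: how much longer elective stays are.
pct_of_emergency <- 100 * diff_days / mean_emergency

# Ratio of means, Emergency / Elective.
ratio <- mean_emergency / mean_elective

cat(sprintf("Difference: %.2f days\n", diff_days))
cat(sprintf("Emergency is %.1f %% shorter than Elective\n", pct_of_elective))
cat(sprintf("Elective is %.1f %% longer than Emergency\n", pct_of_emergency))
cat(sprintf("Mean ratio (Emergency/Elective): %.3f\n", ratio))
```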
## Cohen's d | 95% CI
## ------------------------
## 0.26 | [0.23, 0.30]
##
## - Estimated using pooled SD.
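Cohen's d with a pooled SD is the difference in means divided by the pooled standard deviation of the two groups. The function below is a minimal base-R sketch of that formula; `cohens_d` is a hypothetical helper written for illustration, not the function that produced the output above.

```r
# Cohen's d using the pooled standard deviation of two samples.
cohens_d <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  # Pooled variance: weighted average of the two sample variances.
  pooled_var <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2)
  (mean(x) - mean(y)) / sqrt(pooled_var)
}

# Tiny illustrative samples (not SPARCS data).
x <- c(2, 4, 6, 8)
y <- c(1, 3, 5, 7)
d <- cohens_d(x, y)
print(d)
```

By Cohen's conventional benchmarks (0.2 small, 0.5 medium, 0.8 large), the reported value of 0.26 falls in the small range.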
##
## Welch Two Sample t-test
##
## data: log_los by admission_type
## t = 8.2019, df = 5730.7, p-value = 2.894e-16
## alternative hypothesis: true difference in means between group Elective and group Emergency is not equal to 0
## 95 percent confidence interval:
## 0.08261313 0.13450816
## sample estimates:
## mean in group Elective mean in group Emergency
## 1.673387 1.564826
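A difference in mean log values can be back-transformed into a ratio of geometric means. The sketch below does this for the estimates and confidence limits printed above. It assumes `log_los` is the natural logarithm of `los`, which the output does not state; if a different transform was used (for example `log1p`), this interpretation would not apply directly.

```r
# Group means and CI limits copied from the log-scale Welch t-test output.
mean_log_elective  <- 1.673387
mean_log_emergency <- 1.564826
ci_log <- c(0.08261313, 0.13450816)

# Difference on the log scale (Elective minus Emergency).
log_diff <- mean_log_elective - mean_log_emergency

# Assumption: natural log, so exp() gives a ratio of geometric means.
gm_ratio    <- exp(log_diff)
gm_ratio_ci <- exp(ci_log)

cat(sprintf("Geometric mean ratio (Elective/Emergency): %.3f\n", gm_ratio))
cat(sprintf("95%% CI: [%.3f, %.3f]\n", gm_ratio_ci[1], gm_ratio_ci[2]))
```

Under that assumption, the typical (geometric mean) elective stay is roughly 11 % longer than the typical emergency stay, a smaller relative gap than the one based on arithmetic means, which are pulled upward by very long stays.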
##
## Call:
## lm(formula = los ~ admission_type + age_group + gender, data = dfq)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.615 -3.411 -1.846 0.994 106.994
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.61938 0.46468 12.093 < 2e-16 ***
## admission_typeEmergency -1.89652 0.11518 -16.466 < 2e-16 ***
## age_group18 to 29 0.41086 0.47567 0.864 0.3877
## age_group30 to 49 0.79208 0.46696 1.696 0.0899 .
## age_group50 to 69 0.95356 0.46249 2.062 0.0392 *
## age_group70 or Older 1.28267 0.46099 2.782 0.0054 **
## genderM 0.71278 0.09367 7.609 2.89e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.368 on 19221 degrees of freedom
## Multiple R-squared: 0.01648, Adjusted R-squared: 0.01617
## F-statistic: 53.67 on 6 and 19221 DF, p-value: < 2.2e-16
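In the model above, each coefficient is a difference from a reference level: Elective for admission type, 0 to 17 for age group, and F for gender. The `admission_typeEmergency` estimate is therefore the adjusted Emergency minus Elective difference in days. The sketch below fits the same formula to simulated data to show how the reference levels arise and how to pull out a confidence interval; all values are simulated and the effect size built into the simulation is arbitrary.

```r
# Simulated data with the same column names as the model call (illustrative).
set.seed(7)
n <- 200
dfq <- data.frame(
  admission_type = factor(sample(c("Elective", "Emergency"), n, replace = TRUE)),
  age_group      = factor(sample(c("0 to 17", "18 to 29", "30 to 49"), n, replace = TRUE)),
  gender         = factor(sample(c("F", "M"), n, replace = TRUE))
)
# Arbitrary simulated effect: emergency stays are shorter, with skewed noise.
dfq$los <- 5 - 1.5 * (dfq$admission_type == "Emergency") + rexp(n, rate = 1 / 2)

# The first level of each factor is the reference category.
fit <- lm(los ~ admission_type + age_group + gender, data = dfq)
print(coef(summary(fit)))

# 95% confidence interval for the adjusted admission-type difference.
ci_emergency <- confint(fit)["admission_typeEmergency", ]
print(ci_emergency)
```

With skewed residuals like those in the report (median residual well below zero, maximum above 100), the standard errors from `lm()` should be read with some caution.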
##
## • Emergency patients stay significantly SHORTER than elective patients.
## • Observed mean ratio (Emergency/Elective) ≈ 0.751 .
## • Welch t-test: p-value = 6.2e-34.
## • Effect size is small (Cohen's d = 0.26, 95% CI [0.23, 0.30]).
## • Implication: elective (often surgical) cases drive longer LOS; plan capacity accordingly.
Statistical Interpretation:
The results show that elective admissions had significantly longer hospital stays than emergency admissions in Niagara County in the SPARCS 2012 data. The null hypothesis was that mean length of stay does not differ between the two admission types; the alternative, as tested by the two-sided Welch t-test, was that the means differ. The test rejects the null hypothesis: elective patients stayed about 1.7 days longer on average (6.7 vs 5.0 days, p < 0.001), and the difference persists after adjusting for age group and gender (about 1.9 days in the linear model). The effect size is small (Cohen's d = 0.26), so the difference, though statistically significant, is modest in practice; the linear model explains under 2% of the variance in length of stay (R-squared = 0.016). One plausible explanation is that elective cases often involve planned surgeries and recovery periods, whereas emergency cases may be treated and discharged more quickly.